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Abstract.  In  this  paper  we  describe  the  application  of  a  parallel  implementation  of  the  implicit  filtering  algorithm 
to  a  control  problem  from  hydrology.  We  seek  to  control  the  temperature  at  a  group  of  drinking  water  wells  by  placing 
barrier  wells  between  the  drinking  water  wells  and  a  well  that  injects  heated  water  from  an  industrial  site. 
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1.  Introduction.  The  objective  of  this  paper  is  to  show  how  the  implicit  filtering  algorithm 
[11, 15]  for  noisy  optimization  problems  can  be  applied  to  optimization  problems  in  hydrology. 
We  focus  on  a  groundwater  temperature  control  problem.  This  problem  has  some  of  the  impor¬ 
tant  difficulties,  such  as  nonconvexity  and  nonsmoothness,  that  one  would  expect  in  more  difficult 
cases,  but  can  use  flow  and  transport  models  and  formulations  of  the  optimization  problem  that  are 
sufficiently  simple  to  allow  for  a  complete  description  in  a  single  paper.  More  difficult  problems, 
with  coupled  flow  and  transport,  temperature  dependent  densities  and  viscosities,  three  dimen¬ 
sional  geometries,  and  more  complex  flow  and  transport  equations,  will  be  considered  in  future 
work. 

In  this  paper,  we  solve  the  subsurface  flow  control  problem  with  a  parallel  implementation  [3] 
of  the  implicit  filtering  algorithm  [10, 11, 15].  Implicit  filtering  is  a  sampling  method  for  optimiza¬ 
tion  of  noisy  functions.  The  problem  has  simple  bound  constraints  and  four  optimization  variables. 
The  objective  function  is  nonconvex,  nonsmooth,  and  has  several  local  minima.  The  optimization 
landscape  in  Figure  1.1  is  a  plot  of  the  objective  function  with  two  of  the  variables  set  to  zero. 

We  begin  in  §  2  by  briefly  discussing  the  groundwater  flow  and  transport  models  used  in  this 
work  and  by  formulating  the  control  problem. 

In  §  3  we  review  the  implicit  filtering  algorithm  and  its  implementation  in  parallel.  Then  in 
§  4  we  report  on  the  results  of  the  optimization  and  the  parallel  performance. 

2.  Groundwater  Temperature  Control.  The  problem  we  consider  in  this  paper  was  given 
to  us  by  TGU  (Technologieberatung  Grundwasser  und  Umwelt)  GmbH,  a  consulting  engineering 
company  for  groundwater  and  water  resources.  We  wish  to  control  the  temperature  in  a  set  of 
drinking  water  wells.  The  site  shown  in  Figure  2.1  is  in  the  recharge  region  for  these  wells.  There 
is  an  industrial  zone  on  the  right  of  the  shaded  region  which  injects  heated  water  in  a  single  well, 
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FlG.  1.1.  Optimization  Landscape 
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the  infiltration  well.  German  law  (the  Wasserhaushaltsgesetz)  requires  that  anthropogenic  changes 
of  groundwater  properties  be  minimized.  In  this  regulation  is  the  requirement  that  drinking  water 
be  provided  at  the  lowest  temperature  that  is  possible  under  undisturbed  conditions.  We  seek  to 
reduce  the  temperature  at  the  drinking  water  wells  by  minimizing  a  quadratic  function  involving 
pumping  rates  at  a  set  of  barrier  wells,  which  is  an  approximate  measure  of  cost,  and  a  linear 
combination  of  pumping  rate  and  temperature  at  a  set  of  drinking  water  wells. 

Figure  2.2  shows  the  relative  locations  of  the  wells.  The  injection  well  is  the  square  at  the  far 
right,  the  barrier  wells  the  vertical  row  in  the  middle,  and  the  drinking  water  wells  are  the  array  on 
the  left. 

Numerical  experiments  show  that  a  steady-state  solution  is  obtained  after  eight  to  ten  years 
of  real  time.  Because  of  this  we  may  use  the  four  steady  state  pumping  rates  as  control  variables. 
For  the  work  reported  here  we  neglect  the  vertical  dimension  and  the  dependence  of  viscosity 
and  density  on  temperature.  These  assumptions  enable  us  to  decouple  the  equations  for  flow  and 
temperature  and  to  use  a  two-dimensional  simulator  for  each.  Given  the  controls,  we  can  solve  for 
the  flow  and  use  the  results  from  the  flow  code  to  compute  the  temperature  distribution. 

To  determine  the  flow  we  compute  the  piezometric  head  h  from 

dh 

(2.1)  S—  =  V  •  {BKVh)  +  q 

and  appropriate  initial/boundary  conditions.  In  (2.1),  S  is  the  storage  coefficient,  B(x.  y)  is  the 
thickness  of  the  aquifer,  K(x,y)  is  the  hydraulic  conductivity,  and  q  is  a  source  term.  From  the 
head  we  compute  the  mean  macroscopic  pore  velocity  vector  via 


(2.2) 


KS7h 

e 


where  e  is  the  effective  porosity  of  the  porous  medium. 
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FlG.  2.1.  Map  of  the  Site 


•  Drinking  water  well 


♦  Barrier  well 


Infiltration  well 


After  solving  the  flow  equation,  we  model  temperature  in  a  way  that  a  solute  transport  code 
can  be  used  to  solve  the  relevant  equations  [7]. 
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The  water  temperature  T  satisfies 

dT 

(2.3)  R—  =  V  •  (D  •  VT)  -  v  •  VT, 
where  the  thermal  retardation  factor  is 

(2.4)  R  =  1  + 

^aPa^a 

In  (2.3),  ea  =  0.3  is  the  volume  fraction  of  the  aqueous  phase,  es  =  0.7  is  the  volume  fraction  of 
the  solid  phase,  pscs  =  1.8 MJ/m3K  is  the  heat  capacity  of  the  soil,  and  paca  =  4.185 MJ/m3K 
is  the  heat  capacity  of  the  fluid.  For  saturated  flow,  e  =  ea. 

The  thermal  dispersion  tensor  is 

(2.5)  Dij  =  Pt\v\Sij  +  (A  - 

where  5^  is  the  Kronecker  5,  and  fti  =  10m  and  (3t  =  1  m  are  longitudinal  and  transversal  disper- 
sivity  values  that  are  characteristic  of  the  porous  medium.  D  is  a  nonsmooth  function  of  v,  and 
hence  of  u.  This  accounts  for  the  nonsmoothness  that  is  clearly  visible  in  Figure  1.1. 

We  formulate  the  optimization  problem  as 

(2.6)  min  J(u )  =  uTu  +  wtT. 

u&fl 

Here  u  G  jR4  is  the  vector  of  steady-state  pumping  rates  at  the  control  wells,  T  e  R4  is  the 
temperature  at  the  drinking  water  wells,  and 

w  =  (.0325,  .0119,  .0440,  ,0461)T  e  R 4 

is  a  vector  of  the  relative  pumping  rates  (in  m3 / sec)  at  these  wells.  The  truncation  error  in  the  flow 
and  transport  codes  contribute  low-amplitude  noise  to  J. 

The  bound  constraints  were  imposed  to  account  for  limits  in  the  pumping  rates.  These  con¬ 
straints  were  not  active  at  the  solution,  and  the  optimization  was  essentially  an  unconstrained 
problem. 

3.  Implicit  Filtering.  Implicit  filtering  [11, 15]  is  a  projected  quasi-Newton  iteration  which 
uses  difference  gradients,  reducing  the  difference  increment  as  the  optimization  progresses.  The 
method  was  designed  for  problems  with  objective  functions  that  are  small  perturbations  of  smooth 
functions.  Our  paradigm  is 

(3-1)  /  =  /,  +  0 

where  fs  is  smooth,  and  \  (f>{x )  |  is  small.  In  practice  (f)  is  usually  nonsmooth  and  sometimes  discon¬ 
tinuous. 

Implicit  filtering  is  a  sampling  method.  This  means  that  the  optimization  is  directed  only 
by  information  on  function  values,  with  no  gradient  information.  Implicit  filtering  differs  from 
classical  sampling  methods  such  as  the  Nelder-Mead  [17]  or  Hooke- Jeeves  [12]  algorithms  in  that 
it  is  readily  implemented  in  parallel  [3,4,6]  by  simply  performing  the  function  evaluations  needed 
for  the  difference  gradient  in  parallel.  The  potential  for  quasi-Newton  acceleration  [5,  11,  15] 
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is  a  feature  that  other  parallelizable  sampling  methods,  such  as  the  PDS  method  [8, 19,  20]  or 
DIRECT  [9, 13, 14],  cannot  exploit.  The  results  reported  in  this  paper  were  obtained  with  IFFCO, 
a  FORTRAN  implementation  of  implicit  filtering  [3]. 

Suppose  we  seek  to  solve 

(3.2)  min  f(x) 

x£il 

where 

(3.3)  O  =  {x  e  Rn  \Li  <  (x)i  <  Ui}. 

Here,  {Lj}-!=1  and  {Ui}f=1  are  sequences  of  real  numbers  such  that 

(3.4)  — oo  <  Li  <  Ui  <  +oo. 


Here  we  denote  the  ith  component  of  the  vector  x  by  (x)i  to  distinguish  the  component  index  from 
the  iteration  index.  We  denote  by  V  the  Z2  projection  onto  Q.  For  x  e  Rn 

Li  if  (x)i  <  Li 

(x)i  if  Li  <  (x)i  <  Ui 
U  if  {x)i  >  Ui 

Implicit  filtering  as  implemented  in  IFFCO  begins  by  scaling  0  to  the  unit  cube  (X,  =  0  and 
L\  —  1  for  all  i).  For  0  <  h  <  0.5,  let  V*/  denote  the  finite  difference  approximation  of  V/ 
with  step  size  h  that  uses  central  differences  if  all  points  of  the  central  difference  stencil  are  in  0 
and  one-sided  differences  in  those  directions  in  which  one  point  in  the  stencil  is  not  in  0.  The 
restriction  h  <  .5  implies  that  at  least  two  points  will  be  in  the  stencil  in  any  coordinate  direction 
(the  center  and  at  least  one  of  x  ±  he,  where  e  is  the  unit  vector  in  that  direction).  The  stencil  is 
used  both  to  approximate  the  gradient  and  to  provide  one  of  the  termination  criteria.  Fet  S(x,  h ) 
be  the  difference  stencil  about  x  in  Q  with  step  size  h.  We  call  the  condition 

(3.6)  f(x)  <  min  f(z) 

zEi>(x,h) 

stencil  failure .  In  the  unconstrained  case  [2, 15]  stencil  failure  implies  that  V  fs  —  0(h).  A  similar 
result  also  holds  in  the  bound  constrained  case,  where  stencil  failure  implies  that 

x  -  V(x  -  'Vfs(x))  =  0(h). 

We  terminate  the  quasi-Newton  iteration  for  a  given  value  of  h  after  a  stencil  failure  for  this  reason. 

IFFCO  offers  a  choice  of  SRI  and  BFGS  quasi-Newton  updates.  For  bound  constrained  prob¬ 
lems  we  recommend  the  SRI  update.  We  will  formally  describe  the  algorithm.  We  begin  with 
Algorithm  fdquasi,  which  is  a  finite  difference  projected  quasi-Newton  iteration  for  (3.2). 

Implicit  filtering  is  a  sequence  of  calls  to  fdquasi  with  the  difference  increments  or  scales 
reduced  after  each  return  from  fdquasi. 

There  are  several  convergence  theorems  for  implicit  filtering  [5, 11, 15].  We  state  a  typical 
result  from  [15]  for  completeness. 

THEOREM  3.1.  Let  f  satisfy  (3.1)  and  let  V/s  be  Lipschitz  continuous.  Let  hk  — >  0,  {xk}  be 
the  implicit  filtering  sequence,  and  Sk  =  Six,  hf).  Assume  that  fewer  than  amax  backtracks  are 
taken  for  all  but  finitely  many  k.  Then  if 

lim  (hk  +  h AT1  max  \(f>(z)\)  =  0 


(3.5) 


n*)i  = 


(3.7) 


Algorithm  1  fdquasifx.  f,  prnax.  r,  h.  amax ) 

p  =  1 

while p  <  pmax  and  ||x  —  V[x  —  Vhf(x))\\  >  rh  do 
compute  /  and  Vhf 
if  (3.6)  holds  then 

terminate  and  report  stencil  failure 
end  if 

update  the  model  Hessian  H  if  appropriate;  solve  Hd  =  —'Vhf{x) 
use  a  backtracking  line  search,  with  at  most  amax  backtracks,  to  find  a  step  length  A 
if  amax  backtracks  have  been  taken  then 
terminate  and  report  line  search  failure 
end  if 

x  •<—  V(x  +  A d) 

p  <-  p  +  1 

end  while 

if  p  >  pmax  report  iteration  count  failure 


Algorithm  2  imfilterf  i  ,  f.  pmax.  r,  { hk } ,  amax) 
for  k  =  0, ...  do 

fdquasi(a),  /,  pmax ,  r,  hk,  amax) 

end  for 


then  any  limit  point  of  the  sequence  {.zq:}  is  a  critical  point  of  fs. 

The  implicit  filtering  method  has  many  parameters,  the  sequence  of  scales,  the  termination 
parameter  r,  and  the  limits  amax  and  pmax  on  the  inner  and  outer  iterations.  We  will  discuss  our 
settings  of  those  parameters  in  §  4. 

The  most  significant  opportunity  for  parallelism  is  in  the  computation  of  V^/,  where  all  the 
function  evaluations  for  x  E  S(x,  h)  are  independent.  One  can  also  perform  the  line  search  func¬ 
tion  evaluations  in  parallel.  In  §  4.2  we  show  how  the  parallelism  can  be  effectively  exploited  by 
IFFCO. 

4.  Computational  Results.  The  computations  reported  in  this  section  were  done  on  the  IBM 
SP/2  supercomputer  located  at  the  North  Carolina  Supercomputer  Center  running  IBM  AIX  4.3. 
This  IBM  SP/2  consists  of  180  nodes,  where  each  node  consists  of  four  375  MHz  Power3-II 
processors.  Each  node  has  2  GB  of  memory.  We  used  the  IBM  xlf  7.1  FORTRAN  compiler. 

The  parameters  in  the  implicit  filtering  algorithm  were  r  =  1,  amax  =  5,  pmax  =  10, 
Ui  =  .04  for  i  =  1, . . . ,  4,  and  Lj  =  —  .04  for  i  =  1, . . . ,  4.  We  used  the  SRI  quasi-Newton  method 
and  imposed  a  limit  of  50  function  evaluations  on  the  optimization.  The  scales  were  hi  =  2~l 
for  1  <  i  <  9.  We  terminated  the  optimization  after  we  expended  the  budget  of  50  function 
evaluations. 

The  parallelism  was  in  the  simultaneous  evaluations  of  the  objective  function  to  form  the 
difference  gradients.  We  discretized  the  flow  equations  on  a  42  x  42  mesh  and  used  MODFLOW 
[16]  to  compute  the  piezometric  head.  From  the  head  we  extracted  the  velocity  vector  and  used 
MT3D  [18],  a  transport  simulator,  to  compute  the  temperature  distribution  on  the  mesh.  See  [1] 
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for  a  more  complete  account  of  the  model,  the  boundary  conditions,  and  the  underlying  physical 
assumptions.  We  computed  the  steady-state  solutions  using  accurate  temporal  integration  out  to  ten 
years.  For  the  flow  simulation  120  time  steps  of  30  days  are  taken.  The  transport  integration  was 
explicit,  and  we  took  150  transport  steps  for  each  flow  step.  MODFLOW  and  MT3D  communicate 
via  disk  I/O. 

4.1.  Effectiveness  of  the  Control.  In  Figures  4.1  and  4.2  we  plot  contours  of  temperature. 
We  normalize  the  10° C  temperature  of  the  groundwater  to  zero  and  the  15°C  temperature  of  the 
water  from  the  injection  well  to  one.  The  injection  well  is  located  at  the  box  on  the  right  side  of 
the  plume,  the  control  wells  at  the  vertical  row  of  diamonds  in  the  center  of  the  plume,  and  the 
drinking  water  wells  at  the  circles  to  the  left  of  the  control  wells.  The  temperature  of  the  injected 
water  is  5°C  warmer  that  the  ambient  groundwater  temperature  of  10°  C.  This  leads  to  an  increase 
of  l°C  at  the  drinking  water  wells  for  the  uncontrolled  flow,  to  high  satisfy  the  regulations. 

The  figures  clearly  show  that  the  optimized  pumping  rates  reduce  the  temperature  at  the  drink¬ 
ing  wells  and  that  the  size  of  the  high  temperature  plume  has  been  reduded.  The  maximum  tem¬ 
perature  at  the  drinking  water  wells  is  10.1°C  for  the  controlled  flow,  which  is  within  regulatory 
limits. 


FlG.  4.1.  Temperature  Distribution:  Uncontrolled  Flow 
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4.2.  Parallel  Performance.  As  we  described  earlier  in  §  3,  there  are  two  opportunities  for  par¬ 
allelism  in  IFFCO,  the  evaluation  of  the  gradient  and  the  line  search.  We  exploit  these  possibilities 
in  our  implementation  by  using  the  PVM  parallel  programming  library. 

The  processors  on  each  node  share  2  gigabytes  of  memory  (which  did  not  affect  our  compu¬ 
tations)  and  a  local,  temporary  directory.  We  used  this  temporary  directory  for  the  data  files  and 
temporary  files  we  needed  in  our  simulation.  Since  four  processors  shared  the  same  local  direc¬ 
tory,  we  added  a  unique  task  identification  number  (TID)  to  each  each  temporary  file  to  prevent 
the  different  processors  from  writing  to  the  same  file. 

The  PVM  programming  library  leads  to  the  use  of  the  master-slave  parallel  programming 
paradigm.  In  the  computation  a  master  processor  did  all  the  work  in  IFFCO  except  for  the  function 
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FlG.  4.2.  Temperature  Distribution:  Controlled  Flow 


Number  of  Processors  Run-time  (in  sec.)  Speed  up 


1 

5582.89 

1.0000 

2+1 

2986.31 

1.8695 

4+1 

1618.64 

3.4491 

8+1 

1050.15 

5.3163 

Table  4.1 

Paralell  efficiency 

evaluations.  The  time  needed  to  do  this  was  much  smaller  than  the  time  needed  to  evaluate  a 
function.  Therefore,  we  used  the  master  to  run  IFFCO  and  used  both  the  master  and  the  slaves 
to  do  the  function  evaluations  needed  during  the  evaluation  of  the  gradient  and  the  line  search. 
We  only  needed  to  send  short  messages  between  the  master  and  the  slaves,  so  the  communication 
times  were  very  small  compared  to  the  computations.  This  means  we  only  needed  to  use  the  basic 
send  and  receive  mechanisms  provided  by  PVM. 

The  PVM  implementation  available  on  the  IBM  SP/2  needed  a  dedicated  processor  to  run  the 
PVM  server.  In  Table  4. 1  we  show  the  times  needed  to  solve  the  problem  with  different  numbers 
of  processors.  We  record  the  number  of  processors  as,  for  example,  2  +  1  to  emphasize  that  one 
processor  was  needed  as  the  PVM  server  (a  characteristic  of  the  IBM  SP/2  PVM).  The  last  column 
of  the  table  shows  the  speedup  factor 


where  T\  is  the  time  needed  with  one  processor  and  I]  is  the  time  needed  with  i  +  1  processors. 
Perfect  speedup  for  our  configuration  would  be  Si  =  i. 

Note  that  it  does  not  make  sense  for  this  problem  to  use  more  than  nine  processors.  At  most 
eight  processors  are  required  for  evaluating  the  gradient,  and  one  is  required  as  the  PVM  server. 
Table  4.1  shows  good  parallel  performance. 
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